Abstract: The enormous increase in the usage of communication networks has made protection against node and link failures essential in the deployment of reliable networks. To prevent loss of data due to node failures, a network protection strategy is proposed that aims to withstand such failures. In particular, a protection strategy against any single node failure is designed for a given network with a set of n disjoint paths between senders and receivers. Network coding and reduced capacity are deployed in this strategy without adding extra working paths to the readily available connection paths. The strategy treats protection against a node failure as protection against multiple link failures. In addition, the encoding and decoding operations of the proposed protection strategy are demonstrated.

I. Introduction 

With the increase in the capacity of backbone networks, 
the failure of a single link or node can result in the loss of 
enormous amounts of information, which may lead to catas- 
trophes, or at least loss of revenue. Network connections are 
therefore provisioned such that they can survive such failures. 
Several techniques to provide network survivability have been 
introduced in the literature. Such techniques either add extra 
resources, or reserve some of the available network resources 
as backup circuits, just for the sake of recovery from failures. 
Recovery from failures is also required to be agile in order 
to minimize the network outage time. This recovery usually 
involves two steps: fault diagnosis and location, and rerouting 
connections. Hence, the optimal network survivability problem 
is a multi-objective problem in terms of resource efficiency, 
operation cost, and agility [13]. 

Network coding allows the intermediate nodes not only to forward packets using network scheduling algorithms, but also to encode/decode them using algebraic primitive operations; see [1], [5], [6], [12] and the references therein. As an application of network coding, data loss because of failures in communication links can be detected and recovered if the sources are allowed to perform network coding operations [11].

In network survivability, the four different types of failures that might affect network operations are [10], [14]: 1) link failure, 2) node failure, 3) shared risk link group (SRLG) failure, and 4) network control system failure. Hence, one needs to design network protection strategies against these types of failures. Although the most common failures are link failures, node failures sometimes happen due to a burned switch/router, fire, or any other hardware damage. In addition, a node may fail because of network maintenance. Moreover, a node failure is more damaging than a link or system failure, since multiple connections may be affected by the failure of a single node.

Recently, the authors have proposed employing the network 
coding technique in order to protect against single and multiple 
link failures [2], [4], [8], in a manner that achieves both 
agility and resource efficiency. The idea is to form linear 
combinations of data packets transmitted on the working 
circuits, and transmit these combinations simultaneously on 
a shared protection circuit. The protection circuit can take the 
form of an additional p-cycle [7], [8], a path or a general tree 
network [9]. In the case of failures, the linear combinations 
can be used by the end nodes of the connection(s) affected by 
the failure(s) to recover the lost data packets. These network 
protection strategies against link failures using network coding 
have been extended to use reduced capacities instead of 
reserving, or even adding separate protection circuits [2], [4]. 
The advantages of using network coding-based protection are 
twofold: first, one set of protection circuits is shared between 
a number of connections, hence leading to reduced protection 
cost; and second, copies of data packets are transmitted on the 
shared protection circuit after being linearly combined, hence 
leading to fast recovery of lost data since failure detection and 
data rerouting are not needed. 
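As a minimal sketch of this idea, assume a single link failure, integer-valued data units, and all combining coefficients equal to one, so that the linear combination on the shared protection circuit is a bitwise XOR (addition over the binary field). The function names and data values below are hypothetical and only illustrate the principle, not the authors' implementation.

```python
from functools import reduce
from operator import xor


def protection_packet(data_units):
    """Sum over GF(2) (bitwise XOR) of the units sent on all working circuits."""
    return reduce(xor, data_units, 0)


def recover_single_loss(received, protection):
    """Return (index, value) of the single unit marked as lost (None).

    XOR-ing the protection packet with every surviving unit cancels them,
    leaving exactly the missing unit.
    """
    lost = [i for i, unit in enumerate(received) if unit is None]
    if len(lost) != 1:
        raise ValueError("one XOR combination can repair exactly one loss")
    survivors = [unit for unit in received if unit is not None]
    return lost[0], reduce(xor, survivors, protection)


# Hypothetical data units on four working circuits.
data = [0b1011, 0b0110, 0b1110, 0b0001]
p = protection_packet(data)

received = list(data)
received[2] = None  # the link carrying circuit 2 fails
index, value = recover_single_loss(received, p)
print(index, value == data[2])  # prints: 2 True
```

No failure detection or rerouting step appears in the recovery: the receiver only combines what it already holds.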

In this paper, we consider the problem of providing protection against node failures by means of network coding and the reduced capacity technique. As a byproduct of this protection strategy, protection against any single link failure is also guaranteed. This is based on representing the node failure by the failure of multiple links. However, the failed links are not arbitrary: since the working paths used by the connections that are protected together are link disjoint, the links that need to be protected are used by different connections.

II. Network Model 

The following points highlight the network model and main 
considerations. 
Fig. 1. A network $\mathcal{N}$ with a set of nodes $V$ and a set of edges $E$. The nodes $V$ consist of sources $S$, receivers $R$, and relay nodes $\bar{V}$. The node TI5 represents a failed node with three working connections that must be protected when the failure occurs.



Let $\mathcal{N}$ be a network represented by an abstract graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of undirected edges. Let $S$ and $R$ be the sets of independent sources and destinations, respectively. The set $V = \bar{V} \cup S \cup R$ contains the relay nodes, sources, and destinations, respectively, as shown in Fig. 1. Assume for simplicity that $|S| = |R| = n$, i.e., the number of sources is equal to the number of receivers.

A path (connection) is a set of edges connected together with a starting node (sender) and an ending node (receiver):

$$L_i = \{(s_i, w_{1i}), (w_{1i}, w_{2i}), \ldots, (w_{mi}, r_i)\}, \qquad (1)$$

where $1 \le i \le n$, $(w_{(j-1)i}, w_{ji}) \in E$, and $m$ is a positive integer.

• A node can be a router, switch, or end terminal, depending on the network model $\mathcal{N}$ and the transmission layer; see Fig. 2.

• $L = \{L_1, L_2, \ldots, L_n\}$ is the set of paths carrying the data from the sources to the receivers. Connection paths are link disjoint and provisioned in the network between senders and receivers. All connections have the same bandwidth; otherwise, a connection with a high bandwidth can be divided into multiple connections, each of which has unit capacity. There are exactly $n$ connections. A sender with a high capacity can divide its capacity into multiple unit capacities.

• We consider the case in which failures happen in the relay nodes. A relay node might fail due to a failed switch, router, or any other connecting point, as shown in Fig. 1. We assume that failures are independent of each other.

Definition 1 (Node Relay Degree): Let $u$ be an arbitrary node in $\bar{V} = V \setminus (S \cup R)$ that relays the traffic between source and terminal nodes. The number of connections passing through this node is called the node relay degree, and is referred to as $d(u)$. Put differently:

$$d(u) = \big|\{L_i : (u, w) \in L_i \text{ for some } w \in V,\ 1 \le i \le n\}\big|. \qquad (2)$$

Note that the above definition is different from the graph-theoretic definition of node degree (input and output degrees). However, the node degree cannot be less than the node relay degree. Furthermore, the node relay degree of a node $u$ satisfies $d(u) \le \lfloor \mu(u)/2 \rfloor$, where $\mu(u)$ is the degree of $u$ in the undirected graph: each connection passing through $u$ uses one incoming and one outgoing edge at $u$, and the connections are link disjoint.
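The following is a small sketch of Definition 1, assuming each connection is given as its node sequence $(s_i, w_{1i}, \ldots, w_{mi}, r_i)$ and that the graph contains only the edges used by the paths (extra edges in $G$ could only increase $\mu(u)$). The toy network and the helper names are hypothetical; the assertion exercises the bound $d(u) \le \lfloor \mu(u)/2 \rfloor$.

```python
from collections import Counter


def relay_degrees(paths):
    """d(u): number of connections whose relay (interior) nodes include u, cf. (2)."""
    endpoints = {p[0] for p in paths} | {p[-1] for p in paths}
    d = Counter()
    for path in paths:
        for u in path[1:-1]:  # skip the sender s_i and the receiver r_i
            if u not in endpoints:
                d[u] += 1
    return d


def graph_degrees(paths):
    """mu(u): degree of u in the undirected graph formed by the path edges."""
    edges = {frozenset(e) for p in paths for e in zip(p, p[1:])}
    mu = Counter()
    for edge in edges:
        for u in edge:
            mu[u] += 1
    return mu


# Hypothetical link-disjoint connections L_1..L_4 (node sequences).
paths = [
    ["s1", "a", "v", "b", "r1"],
    ["s2", "c", "v", "d", "r2"],
    ["s3", "e", "v", "f", "r3"],
    ["s4", "a", "g", "r4"],
]
d = relay_degrees(paths)
mu = graph_degrees(paths)
assert all(d[u] <= mu[u] // 2 for u in d)  # d(u) <= floor(mu(u)/2)
print(d["v"], d["a"], mu["v"], mu["a"])  # prints: 3 2 6 4
```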

We can define the network capacity from the min-cut max- 
flow information theoretic view [1]. It can be described as 
follows. 

Definition 2: The unit capacity of a connecting path $L_i$ between $s_i$ and $r_i$ is defined by

$$c_i = \begin{cases} 1, & L_i \text{ is active;} \\ 0, & \text{otherwise.} \end{cases} \qquad (3)$$



The total capacity of $\mathcal{N}$ is given by the sum of all path capacities. By an active path we mean that the receiver is able to receive and process packets through this path; see [3] for further details.

Clearly, if all paths are active, then the total capacity of all connections is $n$ and the normalized capacity is 1. If we assume there are $n$ disjoint paths, then, in general, the normalized capacity of the network for the active and failed paths is computed by

$$C_{\mathcal{N}} = \frac{1}{n} \sum_{i=1}^{n} c_i. \qquad (4)$$



The working paths on a network with $n$ connection paths carry traffic under normal operation; see Fig. 2. The protection paths provide an alternate backup path to carry the traffic in case of failures. A protection scheme ensures that data sent from the sources will reach the receivers in case of failures on the working paths [2], [4].

III. Protection Against A Single Node Failure 

In this section we present a model for network protection against a single node failure (SNF) using network coding. Previous work focused on network protection against single and multiple link failures using rerouting and sending packets through different links. We use network coding and reduced capacity on the paths carrying data from the sources to the destinations. The idea has been developed for link and path failures in [2], [7]. We present a protection strategy denoted by NPS-t. Under NPS-t, the normalized network capacity is based on the max-flow between sources and destinations, and it is given by $(n - t)/n$, where $t$ is the maximum number of connections traversing any node in the network, i.e., the maximum node relay degree. We develop the design methodology of this strategy. In addition, we derive bounds on the field size and the encoding operations.

Fig. 2. Network protection against a single path failure using reduced capacity and network coding. One path out of $n$ primary paths carries encoded data. The black points represent various other relay nodes.

We retain the definitions of the previous section. Let $d(u)$ be the relay degree of a node $u \in \bar{V}$. We define $d_0$ to be the maximum relay degree over all nodes in the network $\mathcal{N}$:

$$d_0 = \max_{u \in \bar{V}} d(u). \qquad (5)$$

Note that $d_0$ is the maximum number of connections that a single node failure can break; in other words, it is the number of working paths that might fail due to the failure of a relay node. Let $v$ be a node with relay degree $d_0$, and assume $v$ to be the failed node. Our goal is to protect the network $\mathcal{N}$ against this node failure. In fact, $d_0$ is the number of failed connections caused by the failure of node $v$ in the network $\mathcal{N}$. Although the failure of $v$ is represented by the failure of $2d_0$ links, each incoming link at $v$ has a corresponding outgoing link, and if either or both of these two links fail, the effect on the connection is the same. Therefore, our protection strategy is based on representing the node failure by the failure of $d_0$ connections, and we therefore need to protect against $d_0$ failed connections.
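To make this reduction concrete, here is a hedged sketch on a hypothetical toy network: it lists the links and the connections lost when a relay node $v$ fails, finds a node attaining $d_0$ in (5), and uses (4) to compute the normalized capacity that would remain if no protection were deployed. Paths are node sequences; all names are illustrative.

```python
from fractions import Fraction


def failed_links(paths, v):
    """All path edges incident to v; for a relay node this is 2 * d(v) links."""
    return [e for p in paths for e in zip(p, p[1:]) if v in e]


def failed_connections(paths, v):
    """Indices i of the connections L_i broken when relay node v fails."""
    return [i for i, p in enumerate(paths) if v in p[1:-1]]


def max_relay_node(paths):
    """Return (d_0, v): the maximum relay degree and a node attaining it, cf. (5)."""
    relays = sorted({u for p in paths for u in p[1:-1]})
    v = max(relays, key=lambda u: len(failed_connections(paths, u)))
    return len(failed_connections(paths, v)), v


def unprotected_capacity(paths, v):
    """Normalized capacity (4) after v fails, if no protection is deployed."""
    n = len(paths)
    return Fraction(n - len(failed_connections(paths, v)), n)


# Hypothetical link-disjoint connections (node sequences).
paths = [
    ["s1", "a", "v", "b", "r1"],
    ["s2", "c", "v", "d", "r2"],
    ["s3", "e", "v", "f", "r3"],
    ["s4", "a", "g", "r4"],
]
d0, v = max_relay_node(paths)
print(d0, v, len(failed_links(paths, v)), unprotected_capacity(paths, v))
# prints: 3 v 6 1/4
```

The six failed links collapse to three failed connections, which is why protecting against $d_0$ connection failures suffices.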

A. NPS-t: Protecting Against an SNF with $d_0 = t$ and Achieving $(n - t)/n$ Normalized Capacity

Assume the sender $s_i$ sends a message to the receiver $r_i$ via the path $L_i$. Also, assume without loss of generality that $t$ disjoint working paths have failed due to the failure of a single node. Then, we describe the protection code as shown in Scheme (6). Under this protection scheme, $n - t$ of the working paths carry plain data units denoted by $x_j^i$'s, where $x_j^i$ is the data unit transmitted on working path $j$ in round $i$. The remaining $t$ paths carry linear combinations, which are denoted by $y_\ell$'s. They are used to recover from data unit losses due to the failure.

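In general, $y_\ell$ is given by

$$y_\ell = \sum_{i=1}^{(j-1)t} a_i^\ell x_i^j + \sum_{i=jt+1}^{n} a_i^\ell x_i^j, \quad \text{for } (j-1)t + 1 \le \ell \le jt,\ 1 \le j \le \lfloor n/t \rfloor. \qquad (7)$$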

We assume that the coefficients $a_i^\ell$'s are taken from a finite field with $q \ge 2$ elements. Later in this section, we show how to perform the encoding and decoding operations for the purpose of recovery from failures, and in the next section we derive bounds on the field size. The following theorem gives the normalized capacity of the NPS-t strategy.

Theorem 3: Let $n$ be the total number of disjoint connections from sources to receivers. The capacity of the NPS-t strategy against $t$ path failures as a result of a single node failure is given by

$$C_{\mathcal{N}} = \frac{n - t}{n}. \qquad (8)$$

Proof: In NPS-t, there are $t$ paths that carry encoded data in each round of a particular session. Without loss of generality, consider the case in which $n/t$ is an integer¹. Then there are $n/t$ rounds, and the capacity in each round is $n - t$. Hence, we have

$$C_{\mathcal{N}} = \frac{\sum_{i=1}^{n/t} (n - t)}{(n/t)\, n} = \frac{n - t}{n}. \qquad (9)$$

■
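The counting in this proof can be checked with a short sketch, assuming the round-robin assignment described for the encoding operations (in round $j$, sources $(j-1)t+1, \ldots, jt$ send encoded units and all others send plain units) and assuming $t$ divides $n$. The helper names are hypothetical.

```python
from fractions import Fraction


def nps_rounds(n, t):
    """Set of sources sending encoded units in each round j = 1, ..., n/t."""
    if n % t:
        raise ValueError("this sketch assumes that n/t is an integer")
    return [set(range((j - 1) * t + 1, j * t + 1)) for j in range(1, n // t + 1)]


def nps_capacity(n, t):
    """Plain units delivered per path per round, averaged over a session, cf. (9)."""
    rounds = nps_rounds(n, t)
    plain_units = sum(n - len(encoders) for encoders in rounds)
    return Fraction(plain_units, len(rounds) * n)


rounds = nps_rounds(6, 2)
print(rounds)              # prints: [{1, 2}, {3, 4}, {5, 6}]
print(nps_capacity(6, 2))  # prints: 2/3, i.e. (n - t)/n
```

Every source takes exactly one turn on a protection path per session, which is the fairness property the rounds are designed for.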

The advantages of the NPS-t strategy described in Scheme (6) are:

• The data is encoded and decoded online, and it will be 
sent and received in different rounds. Once the receivers 
detect failures, they are able to obtain a copy of the lost 
data without delay by querying the neighboring nodes 
with unbroken working paths. 

• The approach is suitable for applications that do not 
tolerate packet delay such as real-time applications, e.g., 
multimedia and TV transmissions. 

• 100% recovery from any single node failure is guaranteed. In addition, up to $t$ disjoint path failures can be recovered from.

• Using this strategy, no extra paths are needed. This makes the approach more suitable for applications in which adding extra paths, or reserving links or paths just for protection, may not be feasible.

• The encoding and decoding operations are linear, and the coefficients of the variables $x_i^j$'s are taken from a finite field with $q \ge 2$ elements.

¹ The general case in which $n/t$ is not an integer can be accommodated by running the strategy for $m(n/t)$ rounds, where $m$ is the smallest positive integer such that $mn \bmod t = 0$.

B. Encoding Operations 

Assume that each connection path $L_i \in L$ has unit capacity from a source $s_i \in S$ to a receiver $r_i \in R$. The data sent from the sources $S$ to the receivers $R$ is transmitted in rounds. Under NPS-t, in every round $n - t$ paths are used to carry plain data ($x_i^j$), and $t$ paths are used to carry protection (encoded) data units, i.e., there are $t$ protection paths. Therefore, to treat all connections fairly, there are $\lfloor n/t \rfloor$ rounds, and in each round the capacity is $n - t$.

We consider the case in which all data units that are combined together belong to the same round. In the first round, the first $t$ sources transmit the encoded data units $y_1, y_2, \ldots, y_t$; in the second round, the next $t$ sources transmit $y_{t+1}, y_{t+2}, \ldots, y_{2t}$; and so on. All sources $S$ and receivers $R$ must keep track of the round numbers. Let $ID_{s_i}$ and $x_{s_i}$ be the ID and the data initiated by the source $s_i$. Assume the round time $\ell$ in session $\delta$ is given by $t_\delta^\ell$. Then the source $s_i$ sends the packet $\text{packet}_{s_i}$ on the working path $L_i$, which includes

$$\text{packet}_{s_i} = (ID_{s_i}, x_i^\ell, t_\delta^\ell). \qquad (10)$$



Also, the source $s_j$ that transmits on a protection path sends the packet $\text{packet}_{s_j}$:

$$\text{packet}_{s_j} = (ID_{s_j}, y_k, t_\delta^\ell), \qquad (11)$$



where $y_k$ is defined in Scheme (6). Hence, the protection paths are used to protect the data transmitted in round $\ell$, i.e., the $x_i^\ell$ data units. The encoded data $y_k$ is computed in a simple way: source $s_j$, for example, collects the data units of the sources transmitting plain data in that round and, using proper coefficients, computes the $y_k$ data units defined in Scheme (6). Every data unit $x_i^\ell$ is multiplied by a unique coefficient $a_i \in \mathbb{F}_q$, which differentiates the encoded data units $y_k$. Thus we have a system of $t$ independent equations in each round that can be used to recover at most $t$ unknown variables.
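A minimal sketch of the packet formats (10) and (11) for one round, under stated assumptions: a prime field $\mathbb{F}_{13}$ stands in for $\mathbb{F}_q$, the timestamp $t_\delta^\ell$ is modelled as the pair (session, round), and the coefficients are supplied by the caller (for instance from (12)). The `Packet` type and the function names are hypothetical, not part of the scheme's specification.

```python
from typing import Dict, List, NamedTuple, Tuple

P = 13  # assumption: the prime field F_13 stands in for F_q


class Packet(NamedTuple):
    source_id: int              # ID_{s_i}
    payload: int                # x_i^l (plain) or y_k (encoded), an element of F_P
    timestamp: Tuple[int, int]  # (session delta, round l), standing in for t_delta^l
    encoded: bool


def build_round(x: Dict[int, int], encoders: List[int],
                coeffs: List[Dict[int, int]], session: int, rnd: int) -> List[Packet]:
    """Packets sent in one round.

    x maps each source to its data unit for this round; sources listed in
    `encoders` send y_k instead of their own data, where coeffs[k][i] is the
    coefficient of x_i in the k-th encoded unit.
    """
    plain = {i: xi for i, xi in x.items() if i not in encoders}
    packets = []
    for k, s in enumerate(encoders):
        y_k = sum(coeffs[k][i] * xi for i, xi in plain.items()) % P
        packets.append(Packet(s, y_k, (session, rnd), True))   # format (11)
    for i, xi in plain.items():
        packets.append(Packet(i, xi, (session, rnd), False))   # format (10)
    return packets


# Hypothetical round 1 of session 1 with n = 4, t = 2: sources 1 and 2 encode.
x = {3: 5, 4: 9}
coeffs = [{3: 1, 4: 1}, {3: 2, 4: 4}]
for packet in build_round(x, [1, 2], coeffs, session=1, rnd=1):
    print(packet)
```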

C. Proper Coefficients Selection 

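One way to select the coefficients $a_i$'s in each round such that we have a system of $t$ linearly independent equations is to use the matrix $H$ shown in (12). Let $q$ be the order of a finite field, and let $\alpha$ be a primitive root of unity in $\mathbb{F}_q$, i.e., an element of multiplicative order $q - 1$. Then we can use this matrix to define the coefficients of the senders as:

$$H = \begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & \alpha & \alpha^{2} & \cdots & \alpha^{n-1} \\
1 & \alpha^{2} & \alpha^{4} & \cdots & \alpha^{2(n-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \alpha^{t-1} & \alpha^{2(t-1)} & \cdots & \alpha^{(t-1)(n-1)}
\end{pmatrix}. \qquad (12)$$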
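The construction (12) can be sketched over a prime field, assuming $q = p$ is prime so that arithmetic is ordinary integer arithmetic modulo $p$ (extension fields $\mathbb{F}_{p^r}$ would need polynomial arithmetic, omitted here). With $p = 13$, $\alpha = 2$ is a primitive element. The brute-force check tests, for the given parameters only, whether every choice of $t$ columns of $H$ yields a full-rank $t \times t$ sub-matrix; it is an illustration, not a proof. All function names are hypothetical, and `pow(x, -1, p)` requires Python 3.8 or later.

```python
from itertools import combinations


def is_primitive(alpha, p):
    """True if alpha generates the multiplicative group of the prime field F_p."""
    return len({pow(alpha, k, p) for k in range(1, p)}) == p - 1


def coefficient_matrix(t, n, alpha, p):
    """0-based entry [j][i] = alpha^(j*i), i.e. alpha^((j-1)(i-1)) in the 1-based (12)."""
    return [[pow(alpha, (j * i) % (p - 1), p) for i in range(n)] for j in range(t)]


def rank_mod_p(rows, p):
    """Rank of a matrix over F_p, computed by Gaussian elimination."""
    m = [list(r) for r in rows]
    if not m:
        return 0
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] % p), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)
        m[rank] = [v * inv % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col] % p:
                f = m[r][col]
                m[r] = [(a - f * e) % p for a, e in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def every_t_columns_full_rank(H, p):
    """Check that each t x t sub-matrix formed by t columns of H is invertible."""
    t, n = len(H), len(H[0])
    return all(rank_mod_p([[row[i] for i in cols] for row in H], p) == t
               for cols in combinations(range(n), t))


H = coefficient_matrix(3, 5, 2, 13)
print(H[1])                              # prints: [1, 2, 4, 8, 3]
print(every_t_columns_full_rank(H, 13))  # prints: True
```

Each column $i$ is a power sequence $(1, \beta_i, \beta_i^2, \ldots)$ with $\beta_i = \alpha^{i-1}$, so any $t$ columns with distinct $\beta_i$ form an invertible Vandermonde matrix.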



We make the following assumptions about the encoding operations shown in Scheme (15).



1) Clearly, if we have a single failure ($t = 1$), then all coefficients are one. The first sender always chooses the unit value in the first row.

2) If we assume $d_0 = t$, then the equations $y_1, y_2, \ldots, y_t$ in the first round are written as

$$y_1 = \sum_{i=t+1}^{n} x_i^1, \qquad y_2 = \sum_{i=t+1}^{n} \alpha^{i-1} x_i^1, \qquad \ldots, \qquad (13)$$

$$y_j = \sum_{i=t+1}^{n} \alpha^{(j-1)(i-1) \bmod (q-1)} x_i^1, \qquad 1 \le j \le t. \qquad (14)$$

Therefore, the encoding operations in the first round for $t$ link failures can be described by Scheme (15).



This scheme gives the general approach to choosing the coefficients at any particular round in any session, where the encoded data $y_j$'s are defined as in (14). In other words, for the first round in session one, the coefficients of the plain data $x_1, x_2, \ldots, x_t$ are set to zero. The scheme can be extended directly to any encoded data $y_k$.

D. Decoding Operations 

We know that the coefficients $a_1, a_2, \ldots, a_n$ are nonzero elements of a finite field $\mathbb{F}_q$; hence their inverses exist and are unique. Once a node fails, causing $t$ data units to be lost, and once the receivers receive $t$ linearly independent equations, they can solve these equations to obtain the $t$ unknown data units. In a particular session, we have three cases for the failures:

i) All $t$ link failures happened on the working paths, i.e., the working paths have failed to convey the messages $x_i^\ell$ in round $\ell$. In this case, $n - t$ equations are received: $t$ of them are linear combinations of $n - t$ data units, and the remaining $n - 2t$ are explicit $x_i^\ell$ data units, for a total of $n - t$ equations in $n - t$ data units. The $t$ encoded packets can therefore be used to recover the lost data.

ii) All $t$ link failures happened on the protection paths at the failed node. In this case, the remaining $n - t$ packets arrive on working paths that do not experience any failures. Therefore, no recovery operations are needed.

iii) The failures might happen on some working and some protection paths simultaneously in one particular round of a session. If $k$ working paths and $t - k$ protection paths fail, then $k$ data units are lost and $k$ encoded packets survive, so the recovery can be done using the surviving protection paths as in case i).
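Case i) can be illustrated with a small end-to-end sketch, under the same assumptions as before: the prime field $\mathbb{F}_{13}$ with primitive element $\alpha = 2$ stands in for $\mathbb{F}_q$, the coefficients come from (12), and the receivers know which sources encoded in the round. The receivers subtract the contribution of the plain units they did receive from each $y_j$ and solve the remaining $t \times t$ system by Gauss-Jordan elimination over the field. The names are hypothetical; `pow(x, -1, p)` needs Python 3.8 or later.

```python
P, ALPHA = 13, 2  # assumption: F_13 with primitive element 2 stands in for F_q


def coeff(j, i):
    """Coefficient of source i (1-based) in encoded unit j (0-based), cf. (12)."""
    return pow(ALPHA, (j * (i - 1)) % (P - 1), P)


def encode(x, encoders, t):
    """Encoded units y_1..y_t of one round from the plain units x (source -> value)."""
    return [sum(coeff(j, i) * v for i, v in x.items() if i not in encoders) % P
            for j in range(t)]


def solve_mod_p(A, b):
    """Solve the square system A z = b over F_P by Gauss-Jordan elimination."""
    n = len(A)
    m = [[a % P for a in row] + [bi % P] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if m[r][c])  # StopIteration if singular
        m[c], m[piv] = m[piv], m[c]
        inv = pow(m[c][c], -1, P)
        m[c] = [v * inv % P for v in m[c]]
        for r in range(n):
            if r != c and m[r][c]:
                f = m[r][c]
                m[r] = [(a - f * e) % P for a, e in zip(m[r], m[c])]
    return [m[r][n] for r in range(n)]


def decode(received, y, encoders, n):
    """Case i): recover the plain units lost on working paths.

    received maps each surviving plain source to its unit; y holds all t
    encoded units; exactly t plain units are assumed lost.
    """
    t = len(y)
    lost = [i for i in range(1, n + 1) if i not in encoders and i not in received]
    rhs = [(y[j] - sum(coeff(j, i) * v for i, v in received.items())) % P
           for j in range(t)]
    A = [[coeff(j, i) for i in lost] for j in range(t)]
    return dict(zip(lost, solve_mod_p(A, rhs)))


# Hypothetical round 1 with n = 6, t = 2: sources 1 and 2 send y_1, y_2.
x = {3: 4, 4: 11, 5: 7, 6: 0}
y = encode(x, {1, 2}, t=2)
received = {3: 4, 5: 7}  # the failed node cut the working paths L_4 and L_6
print(decode(received, y, {1, 2}, n=6))  # prints: {4: 11, 6: 0}
```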






IV. Bounds on the Finite Field Size, $\mathbb{F}_q$

In this section we derive lower and upper bounds on the alphabet size required for the encoding and decoding operations. In the proposed schemes we assume that unidirectional connections exist between the senders and receivers, through which information can be exchanged at little cost. The first result shows that the alphabet size must be greater than the number of connections that carry unencoded data.

Theorem 4: Let $n$ be the number of disjoint connections in the network model $\mathcal{N}$. Then the receivers are able to decode the encoded messages over $\mathbb{F}_q$ and recover from $t \ge 2$ path failures passing through the failed node if

$$q \ge n - t + 1. \qquad (16)$$

Also, if $q = p^r$, then $r \le \lceil \log_p (n + 1) \rceil$. The binary field is sufficient in the case of a single path failure.

Proof: We prove the lower bound by construction. Consider NPS-t at one particular time $t_1$ in round $\ell$ of a certain session $\delta$. The protection code of NPS-t against $t$ path failures is given as (17).



Without loss of generality, (17) is interpreted as follows:

i) The columns correspond to the senders $S$, and the rows correspond to the $t$ encoded data units $y_1, y_2, \ldots, y_t$.

ii) The first row corresponds to $y_1$ if we assume the first round in session one. Furthermore, every row represents the coefficients of all senders at a particular round.

iii) Column $i$ represents the coefficients of the sender $s_i$ through all protection paths $L_1, L_2, \ldots, L_t$.

iv) Any element $\alpha^i \in \mathbb{F}_q$ appears once in a column and in a row, except in the first column and the first row, where all elements are ones. The rows are linearly independent, and so is any set of $t$ columns.

Since the $t$ failures might occur on any $t$ working paths of $L = \{L_1, L_2, \ldots, L_n\}$, we cannot predict the $t$ protection paths either. This means that $t$ out of the $n$ columns do not participate in the encoding coefficients, because $t$ paths carry encoded data. We notice that removing any $t$ out of the $n$ columns in (17) results in $n - t$ different coefficients in each row. Furthermore, any $t$ columns give a $t \times t$ square sub-matrix that has full rank; this will be proved in our extended work. Therefore, the smallest finite field that satisfies this condition must have $n - t + 1$ elements.

The upper bound comes from the case of no failures, hence $q \ge n + 1$. Writing $q = p^r$ with $p$ prime, the result follows. ■

If $q = 2^r$, then in general the previous bound can be stated as $r \le \lceil \log_2 (n + 1) \rceil$.

We define a feasible solution for the encoding and decoding operations of NPS-t as a solution that has integer, reachable upper bounds.

Corollary 5: The protection code (17) always gives a feasible solution.



V. Conclusions 

Protection against node and link failures is essential in all communication networks. In this paper, we presented a model for network protection against a single node failure which, being equivalent to protection against $t$ link failures, can also be used to protect against $t$ link failures. We demonstrated an implementation strategy for the proposed network protection scheme. The network capacity is estimated, and bounds on the network resources are established. Our future work will include approaches for deploying the proposed protection strategy.